Fast computation of Lyot-style 
coronagraph propagation 



R. Soummer* 1 2 , L. Pueyo 3 , 
A. Sivaramakrishnan 1 2 4 , and R. J. Vanderbei 5 

1 Department of Astrophysics, American Museum of Natural History, New York, NY 10024 
2 Center for Adaptive Optics, University of California, Santa Cruz, CA 95064 
- 'Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ 08544 
^Department of Physics and Astronomy, Stony Brook University, Stony Brook, NY 11794 
5 Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544 

corresponding author: 

rsoummer® amnh. org 



Abstract: We present a new method for numerical propagation through 
Lyot-style coronagraphs using finite occulting masks. Standard methods for 
coronagraphic simulations involve Fast Fourier Transforms (FFT) of very 
large arrays, and computing power is an issue for the design and tolerancing 
of coronagraphs on segmented Extremely Large Telescopes (ELT) in order 
to handle both the speed and memory requirements. Our method combines a 
semi-analytical approach with non-FFT based Fourier transform algorithms. 
It enables both fast and memory-efficient computations without introducing 
any additional approximations. Typical speed improvements based on 
computation costs are of about ten to fifty for propagations from pupil to 
Lyot plane, with thirty to sixty times less memory needed. Our method 
makes it possible to perform numerical coronagraphic studies even in the 
case of ELTs using a contemporary commercial laptop computer, or any 
standard commercial workstation computer. 
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1. Introduction 

The field of high contrast imaging has expanded rapidly in the past few years, driven by the 
prospect of science enabled by the direct detection and characterization of extrasolar planets 
and faint circumstellar disks. Several observatories have launched studies and development of 
such projects, e.g. Gemini Planet Imager (GPI) Q, SPHERE 0, or HiCIAO 0. In the future, 
Extremely Large Telescopes (ELT) [4 5,6] will provide the higher angular resolution necessary 
to study more distant planet forming regions, and also faint old giant planets. The study of 
Earth-like planets will probably have to wait for space-based instruments [7,81. Ground-based 
imaging requires the association of coronagraphy and extreme adaptive optics (ExAO), which 
can be studied both theoretically and numerically j^ [T0l[TTl[T2l[T3l . 

In this paper we study the numerical modeling of Lyot type coronagraphs IT4l . including 
Apodized Pupil Lyot Coronagraphs (APLC) US [IS H21 [HI , or phase masks QUE}!. These 
coronagraphs consist of the succession of binary filters (pupil, occulting mask, Lyot stop). The 
method we describe does not provide any significant improvement in the case of infinite-size 
focal plane masks l2Tll22ll23ll24l . which we do not discuss further here. Under the classical 
approximations of Fourier Optics l25l . a Fourier Transform (FT) relationship exists between the 
field amplitude at two successive planes of the coronagraph. The pupil, with its finite support, is 
not band-limited and so cannot be perfectly sampled according to the Shanon-Nyquist sampling 
theorem ll26l . The same problem pertains to the occulting mask in the focal plane. A remedy is 
to impose very fine sampling both in pupil and focal planes, in order to represent small features 
(e.g. segmentation or secondary mirror support structures, occulting spot). Note that Nyquist 
sampling of the focal plane with two pixels per resolution element (X/D), is not sufficient 
here. Indeed, the small occulting mask has a typical size of about 5 A /D, and would not be well 
represented by a disk ten pixels in diameter. This is even more problematic for phase masks of 
size ~ X/D lfT9ll20ll27l . Significant oversampling of the PSF is therefore required for sufficient 
fidelity in coronagraphic modeling. 



This two-fold sampling requirement in two Fourier-conjugate planes is not only a severe 
practical problem, but raises a fundamental problem associated with the uncertainty principle. 
In order to preserve complete information in the FT, a fine sampling in one domain means that 
the corresponding FT must be calculated to a very high frequency in the reciprocal domain. 
This is achieved optically when forming successive pupil and focal plane images. Numerically, 
this can be reproduced using Fast Fourier Transforms (FFT), but this can lead to overwhelming 
demands on memory and computing power. In Secj3]we show that this fundamental limitation 
makes classical FFTs poorly suited for Lyot-type coronagraphic computations. In SecH] we 
analyze the analytical formalism of coronagraphic propagation, and conclude that a partial FT 
algorithm providing arbitrary sampling in a limited area is more appropriate for coronagraphy. 
Several methods can be used for this purpose ll28l|29l , and we detail a matrix-based FT |3Q| . 
which can be implemented easily in any high-level language. 

Comparisons between classical propagation with FFTs and our semi-analytical method show 
a considerable gain both in speed and memory requirements. This therefore opens new possi- 
bilities for coronagraphic studies hitherto impossible without large computer cluster facilities. 
Among them, the study of extremely fine sampling of the pupil plane in the case of ELTs, of 
extremely fine sampling of the occulting plane mask, or of end-to-end simulations of an ExAO 
coronagraph to simulate a long exposure, are achievable with standard desktop equipment. 

2. Lyot-type Coronagraphs 

In this section we recall the general formalism of Lyot-type coronagraphs, following the nota- 
tion of Aime et al. ifTBl and Soummer et al. lfl6l[T8l . The general layout is given in FigQ] The 
setup consists of an ensemble of apodizers, masks and stops in four successive planes A,B,C,D, 
respectively, where A is the entrance aperture, B is the focal plane with the occulting mask, C 
is an image of the entrance aperture where a pupil mask called Lyot stop is placed and D is 
the final image plane. We will consider the usual approximations of paraxial optics |25l . and 
that the optical layout is properly designed to cancel the quadratic phase terms associated with 
Fresnel propagation, so that a FT relationship exists between two successive planes. The tele- 
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Fig. 1. Illustration of the four coronagraphic planes: the pupil corresponds to Plane A (pos- 
sibly apodized). A focal masks (hard-edged, or phase mask) is placed in the focal plane B, 
and a Lyot stop (possibly undersized) in plane C. 

scope aperture function with the position vector r = (x,y) is denoted by P(r) (index function 
equal to 1 inside the aperture 3?). This aperture can be apodized by a function < t > (r). Note 
that these functions do not have to be radial. A mask of transmission 1 — eM(r) is placed in 
the focal plane. M is the index function that describes the mask shape equal to 1 inside 
the coronagraphic mask and outside. L is the index function of the Lyot stop. We recall that 
both the Lyot coronagraph (opaque mask) and the Roddier coronagraph (n phase mask) can 



be described by this common formalism, with e = 1 for Lyot and e = 2 for Roddier using that 
e' 71 = — 1, as detailed in |[T5ll27l . The field amplitude in the four successive planes are: 



<F A (r) = P(r)4>(r) (1) 

«F B (r) = $ A (r)(l-£M(r)) (2) 

«F c (r) = (¥ A (r)-e¥ A (r)*M(r))L(r) (3) 

%(r) = ($ A (r)-e$ A (r)M(r))*L(r) (4) 



In these equations,^ denotes the Fourier Transform, * the convolution product, and we assume 
that the focal lengths of the successive optical systems are identical (if not, an appropriate 
change of variables leads to the same result). Also, we re-orientate the axis in the opposite di- 
rection and omit the coordinate reversal for better legibility here. Also, we assumed monochro- 
matic propagation in these equations. The effect of a finite spectral bandpass can be added 
easily to this formalism: in the focal plane, the size of the point spread function is proportional 
to the wavelength, but the mask (hard-edged or phase) has a fixed size. It can be readily shown 
lfT5l[T6l[3T| that changing the wavelength is formally equivalent to changing the mask size. 

3. Classical numerical propagation and FFTs 

In order to compute the coronographic propagation, we are interested in the numerical evalua- 
tion of the Fourier integral between each plane. Without loss of generality we consider the case 
of a one-dimensional signal f(x) with x G [— yD/2, jD/2] and f(x) = for \x\ > D/2, where y 
is a padding coefficient that will be later related to the resolution of the FT in the image plane. 
We illustrate the relationship between plane A and plane B. We first recall classical results of 
Fourier analysis [28|; the sampled Continuous Fourier Transform (CFT) can be written as a 
Riemann sum: 

F{u k ) = / f{x)e- mxu Hx (5) 

J-yD/2 

~ 8x\ f{x n )e- anx » u * (6) 

n=0 

where x n = (n — yNa/2)8x, n G [0, jNa — 1], corresponding to the sampling points of f(x), and 
Uk = (k — Nb/2)8u, k G [0,Nb — 1] are the sampled values of the Fourier transform. Na is the 
number of pixels along the pupil diameter D, and is the number of pixels in the focal plane. 
Note that under the Riemann sum approximation, independent sampling grids can be chosen in 
both domains. However, when one chooses the same size = jNa = Nb for both arrays, and 
the integration steps as: 

8x 8u = —. (7) 

then Eq. |6]greatly simplifies to: 

F{u k ) = ^(-lf/^ N f(-l)"f( Xn ) e -^"/ N , (8) 
n n=0 

where the sum is now a Discrete Fourier Transform (DFT), which can be computed very ef- 
ficiently using FFT algorithms. The number of floating point operations (flops) in a radix-2 
Cooley-Tukey FFT of size is 5A^ log 2 N. It is generally assumed that this is also an approxi- 
mation for the number of operations in any complex FFT If32l . 



It is critical to realize that this possibility of using FFTs is obtained at the expense of a 
fixed relationship between the focal plane sampling and the padding factor, forced by Eq. [7] 
8u = X/ (yD), as illustrated on Fig. [2] Under this condition, the occulting mask of m resolution 
elements is therefore sampled with ym pixels. Because stellar coronagraphs use very small 
masks (typically 4 ~ 5X/D for an APLC, and X/D for a PM or DZPM), decent sampling of 
these masks (at the very least a few tens of pixels) imposes large zero padding factors y. A 
general consensus for coronagraphic calculation is to use y = 6 or y = 8. The memory and 
computational speed problems associated with classical coronographic algorithms stem from 
the requirement of finely sampling the image plane mask. This can lead to prohibitively large 
padding factors for some applications which can benefit for particularly fine sampling, such as 
tip-tilt tolerancing l33l . study of atmospheric differential refraction effects, mask ellipticity or 
roughness studies. 

A common method to mitigate these mask sampling problems is to use a gray-pixel approx- 
imation, where gray pixels are used at the edge of the mask: their value is defined as the ratio 
of the area covered by the mask to the total pixel area. This technique corresponds to a slight 
numerical apodization of the focal plane mask, akin to the use of an anti-alizing filter. The same 
approach can be used for small features in the pupil plane, such as secondary mirror support 
structures. However, there are cases where the gray approximation cannot be used. For example, 
the tolerancing of the effects of the small pupil features (segmentation, spiders 1341 ), and their 
mitigation by the Lyot Stop cannot be done with gray pixel approximation. Prolate apodizers 
for APLCs need to be defined without gray pixels in the pupil fl"8l . In the case of phase masks 
(Roddier |fl9l or Dual Zone l20l ). the gray approximation is meaningless for the phase mask, 
and large padding coefficients have to be used since these masks are very small (typically one 
resolution element). 

4. Semi-analytical coronagraphic propagations 

4.1. Principle 

As shown in Fig. [2] FFT methods address the image plane sampling requirement by zero 
padding the pupil, and mimicking the optical propagation, preserving the complete informa- 
tion between planes. A much better approach can be derived from the analytical expression of 
the field in the Lyot plane, simply re-writing Eq.[3]as: 

V c (r) = (VA(r)-eJ?[J?\V A (r)]M(r)])L(r), (9) 

where we can readily identify that the first FT of the pupil field amplitude jF^a (r)] is truncated 
by the occulting spot M(r), and that the second FT is truncated by the Lyot Stop L(r). This 
means that we are only interested in the knowledge of the FTs inside limited areas, viz., the 
limited occulting mask area, and the limited Lyot stop area. We can thus completely circumvent 
the sampling problem by restricting the information of the FTs to these two zones: the semi- 
analytical approach consists of computing these limited-area FTs numerically, and subtracting 
the result from the pupil complex amplitude, according Eq. [9] 

In the case of one-dimensional problems (rectangular apertures [15], or perfect circular aper- 
tures Ifl6ll2"0l ). the semi-analytical approach can be applied by calculating directly these two 
limited-area FTs (or Hankel Transforms for circular apertures) using a one-dimensional numer- 
ical integration algorithm. This approach presents some advantages for phase masks calcula- 
tions l20l . but such direct integrations cannot be used efficiently in the two-dimensional case 
for computation reasons. 

In the general two-dimensional case, the two limited-area FTs can be calculated using partial 
FT methods. Several methods exist to calculate such partial FTs, such as the Fractional Fourier 
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Fig. 2. Sampling relationship between pupil (left column) and image plane (center), using 
FFTs. The padded pupil plane array is y times larger than the actual pupil (here y= 3), and 
its Fourier transform features three pixels per unit of angular resolution, as shown in the 
zoomed image of the core (right). In this case a 4 ~ 5 resolution element mask would not 
be sufficiently sampled. A general consensus for coronagraphic calculations is to use y = 6 
or y = 8. 



Transform or chirp z-transform, [28 29 1, and we describe a very simple matrix-based method 
in the next section l30l . These partial FT methods are intrinsically slower that a FFT if exactly 
the same computation is performed. However, since we are only interested in a small number of 
points within limited areas (occulting mask and Lyot stop), we actually replace fast calculations 
with very large arrays by slow calculations with very small arrays. Note that for 7=8, only 40 x 
40 values of the FT need to be calculated in the first focal plane for a 5X/D mask, independently 
of the size of the pupil n pixels. 

The semi-analytical propagation from the pupil to the Lyot plane is performed as follows: 

• Define the pupil field amplitude without zero padding as a Na x Na array, where Na is 
the diameter of the pupil in pixels. 

• Calculate the FT to an arbitrary fine sampling inside the area limited to the occulting 
mask asaiVgX Nb array, with e.g. Nb = 40 for a 5A /D mask with y = 8. 

• Calculate the FT of this previous field amplitude inside an area limited to the pupil (Na x 
Na array). 

• Reverse the spatial axes because two successive FTs restore the function while changing 
the sign of the variable, and subtract the result from the pupil field amplitude (An inverse 
FFT can also be used between plane B and plane C to avoid reversing the axis). 

Note that for convenience all masks (pupil and occulter) are defined with their centers located 
on the pixel Na/2+1 (oiNb/2+1), and that the focal plane mask can be defined using the gray 



pixel approximation, as for the classical FFT method. These computation steps are illustrated in 
FigEJand compared to the classical FFT approach. The figure shows the actual arrays that are 
calculated for both method: note that the semi-analytical method does not use zero-padding in 
the pupil plane and calculates the focal plane amplitude only inside the occulting mask, instead 
of outside in the case of the classical FFT method. 

In plane C, the Lyot stop can be equal to the pupil size in the case of an APLC, or more gen- 
erally undersized to optimize the coronagraphic efficiency according to various possible criteria 
||9][T8). If tne Lyot stop size has been previously chosen, the calculated area can be limited to 
the Lyot stop itself to accelerate further the computation, but in practice we usually compute 
the Lyot plane amplitude over the size of the entire pupil. If necessary, it is straightforward to 
oversize the calculated region, at the expense of computing efficiency, for example to see the 
light diffracted outside the pupil. 

Avoiding the typical six-to-eight fold zero-padding enables the use of much smaller arrays, 
than FFT calculations require. For example, in the case of GPI (D = 8m), the secondary mirror 
support structures are about 1cm wide. Understanding the effects of spiders on coronagraphic 
performance is important since they appear bright in the Lyot plane and musk be masked out 
by the Lyot stop |34| . For example, with 4 pixels per spider, Na = 3200 pixels are needed in the 
pupil. Standard 7=8 padding would require 25600 x 25600 FFTs. The semi analytical method 
enables this calculation in a few seconds on a standard desktop computer (Table 2). 

It is important to note that the calculations of the partial FTs inside the mask do not corre- 
spond to an additional approximation, as it may seem that we discard the information outside 
the focal plane mask. In fact, we do not lose any information because the method is based on 
the analytical formulation of Eq. [9] which is equivalent to Eq.[3] 



4.2. Matrix direct Fourier transform 

In this section we describe a simple matrix Fourier transform (MFT), which can be used for 
the semi-analytical method. In order to restrict the computation of the FT of the pupil to the 
image plane size m expressed in resolution elements units (X/D), we choose the sampling 
step in plane B such that: du = m/Ns- We compute the Riemann sum directly using a matrix 
formulation of Eq|6l 
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\ F(u Ng ^i) J \ e 
which can be rewritten as: 

F(V)=e- 2MxT -/(X), (11) 

where U = (uq «w B -i) r , X = (xq xn a -\) t , and exp(-) is the element-wise exponential of 

a matrix. Two-dimensional FTs can be implemented straightforwardly as follows: 

• Define the four vectors U = (mo «w s -i) r , X = (xq xn a -i) t , V = (vo \'n b -i) t , 

Y = (vo yN A -i) T - 

• The vector elements are x k = y k = [k — Na/2) x 1 /Na and ui =vi = (I —Na/2) x ih/Na, 
for k=[0,...,N A - 1] and l = [0,...,N B - 1]. 

• The two-dimension FT is obtained by computing the two matrix products: 



F(U,H) 



-2ixUX' 



N A N B 



■f(X,Y)-e 



-2inY\' 



(12) 



FFT 

N = 7 iV A = 4500 



SAM 

N A = 750 N B = 30 




1000 2000 3000 4000 



100 200 300 400 500 600 700 



Fig. 3. Computing steps in each plane A,B,C using FFTs with a six-fold zero padding 
(4500 x 4500 FFTs), and the semi-analytical method (SAM), for a possible ELT geometry 
and an APLC. With SAM, only the points of the FT that are inside of the occulting mask 
are computed as opposed to the FFT method, where the points outside the mask are needed. 
In this example we obtain a speed improvement of about 15 using 36 times less memory. 



where the normalization coefficient tti^NaNb) imposes the conservation of energy ac- 
cording to the Parseval theorem |26l : the energy in the limited-area FT is a fraction of 
the total energy of the FT, corresponding to the limited area, which was calculated. 

• With this definition, the Fourier transform is centered in a similar fashion to the FFT, 
with the zero frequency at the pixel Nb/2 + 1 . 

Note that when the conditions of Eq|7]are verified, both the FFT and the matrix FT give identical 
results. This can be used as a numerical check when implementing the technique. This is not 
surprising since the matrix method is a direct implementation of the Riemann sum, which is also 
what is computed by a FFT (very efficiently), taking advantage of simplifications for particular 
sampling conditions (Eq[8]l. With the MFT, the arbitrary sampling du in the focal plane is 
obtained at the expense of partial information on the FT. Note that in the case of the MFT, y still 
corresponds to the sampling of the FT (number of pixels per resolution element), but not to the 
zero-padding coefficient. The semi-analytical formalism frees the computation from the rigid 
sampling constraints of FFT methods. This provides increased flexibility in the computation, 
since the number of pixels in the pupil and focal planes can be chosen independently. 

In terms of computation cost, the matrix FT involves 2 complex-matrix products of the form: 

F = E l -f-E 2 , (13) 

where the respective sizes of the three matrix E\ , /, E2 are: Nb x Na, Na x Na, and Nb xNa- We 
consider the case of a complex FT for generality. Each element of the result in a matrix product 
is the result of Na multiplications and Na — 1 additions. While complex addition requires 2 flops 
(floating point operations), complex multiplications require 6 flops. Since the two matrices E\ 
and E% can be calculated beforehand, in a similar fashion that FFT plans are generated, the total 
number of operations for the product f.E% is therefore: NaNb(&Na — 2) ~ SN^Nb, assuming 
Na large enough. The number of flops involved in the first complex MFT (pupil to focal) is 
therefore: 

n(MFT) = 8(nIn b +N a N^) - 2N A N B - 2N%. (14) 

In the particular case where both pupil and image plane have the same array size N, it is inter- 
esting to note that the number of operations for the matrix method is proportional to A^ 3 and not 
Af 4 as for a direct calculation of the DFT This is because the two successive matrix products 
take advantage of some redundancy in intermediate calculations. The FFT is as expected more 
efficient in this case, and the relative number of operations is: 

n(FFT) _ 51og 2 (AQ 
n(MFT) W 

For example a 2000 x 2000 FT requires 36 times more operations with a MFT than with a FFT. 
This calculation, which takes about 0.5s with a FFT on a on a 2GHz Apple G5 (1 Gflop/s), 
would take 18s with a MFT. The performance comparison between the FFT propagation method 
and the semi-analytical method is given in Sec. [5] 

4.3. Fast fractional Fourier transform 

The computation of the FT in a limited area can also be made using a Fractional Fourier Trans- 
form. This method is presented by Bailey and Swarztrauber 11281 . we refer the reader to this 
work for a complete description of this algorithm. Consider the Riemann sum in Eq. [6] and 
assume that there is no zero padding x„ = n8x — (NaSx)/2, n 6 [0,Na — 1] with 8x = D/(Na) 
and Uk = k8u — (NbSu) /2, k £ [0,Nb — 1] with 8u = 1 /{jD) and Nb = my, where m is the size 
of the image plane mask in resolution element units (X/D). Note that here again y does not 



correspond to an oversize of the pupil but directly to the number of pixels per unit of angular 
resolution in the image plane. The Riemann sum can be re-written as: 

F(u k ) = -J(.*/M*-*b/2) £ J**NbIWa) f ^ e anknl(yN A ) 
" n=0 

which corresponds to a Partial Fourier Transform of the series g„ = exp((/7T«A^) / (yN A ))f(x n ). 
This sum can be evaluated using Bluestein's technique ll35l . which is related to the chirp z- 
transform. By noticing that 2kn = n 2 + k 2 — (k — n) 2 the Partial Fourier Transform above can 
be written as a convolution: 

n n a 

F( Uk ) = —jWrW-Wl) £ e irniN B /{yN A ) f( Xn yxk 2 /(yN A ) e -in(n-k) 2 /( 7 N A ) 
N n=0 

Consequently, the computation of the field in the image plane to an arbitrary sampling can 
be computed using a discrete convolution of sequences of length Na- We direct the reader to 
the aforementioned references for details on these computation schemes. In order to use fast 
circular convolution algorithms, the incoming sequence needs to be zero-padded by factor of 
two, and the computational cost corresponds then to three FFTs of size 2N A - When the field in 
the image plane is computed up to the Nyquist limit, the computational cost of the method is 
2(Wjlog 2 (A^). When the size of the final array is Nb^Na, partial convolution algorithms can 
be used, and they reduce the computational cost to 2QN\ log 2 (iVg) ll28l . In the case of large FT 
sizes (Nb large) the fractional FT method has a lower cost than the MFT and the potential gain 
is of the order of Nb / log 2 (Nb ) ■ 

5. Performance comparison and practical applications 

5.1. Computation costs for coronagraphic propagation 

We compare the computation costs of coronagraphic propagation from pupil to Lyot plane in the 
case of the FFT and semi-analytical methods. As described before, coronagraphic propagation 
with FFTs requires pupil zero-padding by a coefficient y. The array sizes are therefore (jNa ) 2 , 
where Na is the number of pixels across the pupil diameter and the number of flops involved in 
each FFT is lO(yiV^) log2 (yi^i). The total number of operations for a propagation from pupil 
to Lyot plane is: 

n(FFT) =20{YN A ) 2 log 2 (YN A )+6(Ym) 2 , (18) 

where we also include the multiplication by the focal plane mask (6(ym) 2 flops), which can be 
neglected in most cases of interest. 

With the semi-analytical method, the size of the first FT is limited to the mask area, = 
ym where m is the mask size in resolution elements (we remind here that typically m = 5 
for a Lyot coronagraph and m = 1 for a Roddier or DZPM). The number of operations for 
the first MFT (Eq. PB1) , The second MFT transforms the Nb x Nb array into a Na x Na array 
and its cost is n(MFT) = 8(N^N B + N A N%) - 2N A N B - 2N\. The propagation from pupil to 
Lyot plane includes the two MFTs, the multiplication by the focal plane mask (6A^J flops) and 
the subtraction of the result from the pupil (2N A flops). The total cost of a semi-analytical 
propagation from pupil to Lyot plane is therefore: 

n(SAM) = \6(Nlym + N A {7mf)+A(Ym) 2 -AN A N B , (19) 

where the last two terms can be neglected in most cases of interest. The relative computation 
cost between the classical FFT propagation and the semi-analytical method is therefore approx- 
imately: 

n(FFT) _5yN A log 2 (jN A ) 



n(SAM) 4 mNA + ym 2 



(20) 



neglecting the multiplication by the focal plane mask. This expression shows that there are 
always more operations with the FFT than with the semi-analytical method, and that the relative 
cost of the FFT increases with the padding factor y and the pupil size Na- 

In FigHl Fig|5] and FigO we illustrate the gain between the semi-analytical and FFT meth- 
ods, using the ratio of their computation costs given in EqQ~8] and Eq|20| We first give some 
results for a typical Lyot coronagraph simulation on an 8-meter class telescope, and two cases 
where the computation gain can be even larger: an ELT simulation with fine pupil sampling, or 
a phase mask coronagraph where very fine sampling is required in the focal plane. 
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Fig. 4. Gain between SAM and FFT for a propagation from pupil to Lyot plane, as a func- 
tion of the number of pixels across the pupil. The gain is based on the evaluation of the com- 
putation costs for both methods. We use common zero-padding coefficients (7= 4, 6, 8) and 
a mask size in = 5 A /D. This corresponds to the regime of simulations of Lyot coronagraphs 
for 8-meter class telescopes. 
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Fig. 5. Same as Fig|4] but in the case of an ELT where a very large number of pixel is used 
in the pupil. 
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Fig. 6. Same as Fig(4) but for a Roddier or Dual Zone coronagraph with a mask size of m = 
X /D and an oversizing 7= 20. This corresponds to a maximum FFT size of 10000 x 10000 
for the 500 pixel pupil. 

5.2. The case of direct imaging 

In this section we discuss briefly the case of direct imaging as a potential application of the 
MFT. Although in most cases the MFT requires more computations than a FFT, it is possible 
to identify a few cases where it can be more efficient. This happens when the calculated area 
is limited and/or when a fine sampling is needed. The MFT was first used by Give'on et al. 
ll36l[37l for the study of wavefront sensing and correction in high-contrast imaging with shaped 
pupils 1381 . 

We illustrate the case of a point spread function (PSF) calculation with a field of view ( FOV) 
of 50 resolution elements in Figj7] In this case, both techniques have similar costs with a slight 
advantage for one or the other depending on pupil size and sampling parameter 7. If the PSF 
is only calculated to a FOV of 20 resolution elements, the MFT provides a gain by a factor of 
several over the FFT (Fig|H]l. In some applications where only the center of the PSF is needed, 
gains comparable to that of the coronagraphic cases can be obtained. 

5.3. Practical applications and implementation 

In this section we discuss the actual performance of the semi-analytical method for a specific 
computer, a 2003 Apple G5 dual processor 2Ghz computer with 4Gb RAM. We measured the 
performance of this machine using the package FFTW (version 3) ll32ll and report the results in 
Fig|9] We considered the case of double precision complex 2D arrays, which is appropriate for 
coronagraphic simulations. These results are consistent with those given on the FFTW website, 
but we extend them up to 8192 x 8192 FFT to verify that the speed does not depend strongly 
on the array size. Our calculations used the FFTW_PATIENT option, which optimizes the FFT 
plan prior to the calculation of the FFT itself. We used single-threaded FFTs although the FFTW 
packages provides the possibility of multi-threaded calculations for further speed improvement 
on multi-core machines. 

In order to evaluate the machine's performance for the MFT, we considered the Geekbench 
benchmark software [391 which reports 1.4 Gflop/s for single-threaded dot products and 2.6 
Gflop/s for multi-threaded ones. It seems that dot products are slightly more efficient than the 
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Fig. 7. Non-coronagraphic, direct imaging with FFT and MFT where the PSF is calculated 
to a field of view of 50 resolution element (25 in radius). Both MFT and FFT have similar 
costs with a slight advantage to MFTs when higher sampling is used. 
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Fig. 8. Non-coronagraphic, direct imaging with FFT and MFT where the PSF is calculated 
to a field of view of 20 resolution element (10 in radius). For a limited field of view, the 
MFT can be significantly faster than the FFT. This can be interesting for coronagraphic and 
extreme adaptive optics simulations. 



FFTW on this hardware (1.4 Gflop/s vs. 1 Gflop/s). This may be due to the easier implemen- 
tation of the dot-product which takes direct advantage of the processor's vector engine. An 
interesting aspect of the MFT is that it can take advantage of recent multi-core computers, with 
easy and efficient multi-threading of linear algebra calculations. 

High level languages such at Mathematica or Matlab are well optimized for vector computa- 
tions. For example, we obtain 2.5 Gflop/s with Mathematica for multi-threaded dot-products on 
double precision complex arrays of sizes between 1000 x 1000 and 10000 x 10000. This result 
is consistent with the benchmark results this machine (2.6 Gflop/s). In the case of a propagation 
from pupil to Lyot plane, the speed goes down to 1 Gflop/s with Mathematica because the other 
operations involved in the semi-analytical method are not fully optimized with this language. 
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Fig. 9. Normalized speed expressed in Mflop/s using the FFTW3 package 1321 on a 2Ghz 
dual-processor 2003 Apple G5 with 4Gb RAM. The normalized speed is defined as the 
number of operations divided by the execution time: ION 2 log 2 (N)/T where T is the time 
for one FFT (single-threaded, double precision complex arrays). The red dots correspond 
to power of twos arrays. This hardware delivers approximately 1 Gflop/s over the range of 
array sizes, which translates for example into 2.3s for 4096 x 4096 transforms, and 10s for 
8192x 8192 transforms. 



In Table 1, we compare the two methods for three pupil sizes (256 x 256, 512 x 512, 1024 x 
1024). Na and Nb are the number of pixels used in the calculation, in the pupil and focal planes. 
We use the same number of pixels in the Lyot plane as in the pupil. We show the effect of the 
sampling improvement from X/4D to X/8D, obtained by zero-padding the pupil with FFTs. 
Note that for the semi-analytical method, the number of pixels Na corresponds to the pupil size 
itself, and that Nb corresponds to the focal plane mask size. For the FFT, Na corresponds to the 
zero-padded pupil. The timings given in the table correspond to the time for 2 FFTs calculated 
with FFTW3, neglecting the application of the focal plane mask and zero-padding time. For the 
semi-analytical method, we give timings obtained with Mathematica as example. 

Coronagraphic simulations are usually performed with a X /6D to X /8D sampling, in order 
to keep the FFTs manageable. With the semi-analytical method, we can increase the precision 
significantly, and we typically use a X/10D or X/2QD sampling for the focal plane mask, 
including gray pixel approximation. This corresponds to a 40 ~ 50 or 80 ~ 100 pixel diameter 
mask for a 4 ~ 5 resolution element mask. 

The semi-analytical method offers a total flexibility in terms of sampling in both planes, and 
there is no particular limitation to the type of calculations. We present a few examples in Table 



Table 1. Performance comparison of FFT-based and semi-analytical methods (SAM). Na 
is the number of pixels across the pupil diameter, and Ng is the size of the array in the 
focal plane. The FFT timings correspond to 2 FFTs calculated with FFTW3, and the SAM 
timings correspond to an actual propagation from pupil to Lyot plane, with Mathematica. 



Method 


Pupil Plane 


Image Plane 


Time 




N A 


Array size 


Sampling 


Array size (Ng) 




FFT 


256 


1024 


A/4D 


1024 


0.22 s 


FFT 


256 


2048 


A/8D 


2048 


1.0s 


SAM 


256 


256 


A/8D 


40 


0.06 s 


FFT 


512 


2048 


A/4D 


2048 


1.0s 


FFT 


512 


4096 


A/8£> 


4096 


4.7 s 


SAM 


512 


512 


A/8D 


40 


0.19 s 


FFT 


1024 


4096 


A/4D 


4096 


4.7 s 


FFT 


1024 


8192 


A/8D 


8192 


20.2 s 


SAM 


1024 


1024 


A/8D 


40 


0.70 s 



2, which would hardly be achieved with FFT propagations. For example, one can choose to 
increase the sampling in the pupil, with a modest A/8D sampling in the focal plane, in order 
to study the effect of segmentation of an ELT. We show an example of a 6000 x 6000 pupil 
array. This corresponds to calculations that are typically made on very large computer clusters 
using the classical method. This would correspond to 36000 x 36000 FFTs with y = 6. It is 
also possible to oversample dramatically the occulting mask. This can be used to study the 
effect of the actual mask shape, such as ellipticity, edge roughness, fine alignment, or effects of 
atmospheric differential refraction. We give an extreme example with 1000 pixels in the pupil 
and a resolution of A /200D. This calculation would corresponds to 200000 x 200000 FFTs. 

Another application is the simulation of Roddier or Dual Zone phase masks (DZPM), where 
the masks have a typical size of A /D, making the sampling problem even more critical for 
FFTs. The optimization of DZPM |20l . which was obtained using the semi-analytical approach 
by integrating directly the Hankel transforms, can be studied by direct numerical simulations 
with our method. 

Table 2. Performance of semi-analytical simulations for double precision complex arrays, 
with their possible applications. These calculations cannot be performed using the FFT 
method on commercial workstations, as they would correspond respectively to 200000 x 
200000, 25600 x 25600, and 48000 x 48000 FFTs. 



Method 


Pupil Plane 


Image Plane 


Time 


Application 


N A 


Array size 


Sampling 


N B 


SAM 


1000 


1000 


A /200D 


1000 


13.5 s 


Mask Tolerancing 


SAM 


3200 


3200 


A/8D 


40 


6.1 s 


4 pixel/spider on Gemini 


SAM 


6000 


6000 


A/8D 


40 


22 s 


5 mm/pixel for 30 m ELT 



6. Conclusion 



In this paper we introduced a semi-analytical method to calculate the numerical propagation 
through a Lyot-style coronagraph, without any additional approximation. We showed that FFTs 
are not appropriate for Lyot coronagraphy because of fundamental sampling limitations, and 
should be avoided for these calculations. The semi-analytical method is derived straightfor- 
wardly from the analysis of the wave propagation, and requires the use of a Fourier transform 
method producing arbitrary sampling in a limited area. Several methods exist to perform such 
transforms, we suggest a matrix-based Fourier propagator that can be implemented efficiently 
in any language. 

For coronagraphic studies for ExAO on ELTs and eight meter class telescopes, the typical 
speed improvement based on computation costs is of at least a factor twenty to fifty compared to 
the classical FFT coronagraphic propagation method. In addition to speed, our semi-analytical 
method is particularly efficient in terms of memory, as it does not involve any zero padding, 
and typical memory reequirements are reduced by a factor of about 50 in our usage of the 
algorithm. In the case of a full propagation from pupil to final focal plane, if the final FT is 
performed using a FFT, the semi-analytical method is approximately three times faster than a 
FFT-based propagation, since the propagation time is entirely dominated by the final FFT. If a 
reduced field of view is calculated in the final image, which is often the case in extreme adaptive 
optics coronagraphy, an MFT can be used in lieu of an FFT. For example a factor of 5 can be 
gained on the last FT for Na = 400 and 7=10, and the overall gain with the semi-analytical 
method is a factor of 15. 

With the semi-analytical method, the pupil and focal plane sampling are completely inde- 
pendent. This flexibility enables calculations that were hitherto impossible, or only possible on 
large and expensive computer clusters. For example, the study of fine occulting mask features 
and alignment, or fine pupil structures like segmentation, is now achievable on a well-equipped 
but otherwise standard laptop, or on most commercial workstation sold today. 

The MFT may also find interesting applications where a FT has to be calculated in a limited 
area. This was used by Give' on et al. 136*1 l37l and new applications may be found in adaptive 
optics for anti-aliasing spatial filters l40l . pyramid wavefront sensing BTI . etc. 

Our method also enables fast computation of optimal apodizers for APLCs, which involves 
an iterative algorithm and makes computing efficiency even more critical 11811 . APLCs are one 
of the most promising concepts for high contrast imaging instruments on ELTs, and their study 
requires the optimization of the mask size according to various criteria, over broad spectral 
bandpasses. A complete exploration of the effects of chromaticity, segmentation and spiders on 
the coronagraph is now made possible with acceptable computing times. Our method could be 
utilized in several current efforts focussed on ground and spaced based coronagraphy. 
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